Inter-Word Constraints in Visual
Author
Abstract
A method of using knowledge about constraints between words in a technique for reading a running text is presented. This knowledge is represented as a set of allowable transitions (called "inter-word constraints") and a method is given for incorporating this knowledge representation in a system for reading visual images of text. This system first analyses the shape of a word image to suggest a group or neighborhood of words in a dictionary (list of words) that contains an input word. The inter-word constraints are then used to reduce the size of the neighborhood and the smaller neighborhood is used to direct further detailed analysis of the input. This process results in a match of the input image to one of the words in the neighborhood. Results are reported in this paper on the performance and cost of two representations for inter-word constraints. The potential of these knowledge sources to reduce the neighborhood size is explored in a series of statistical experiments. It is shown that the average size of a neighborhood encountered when a text of 150,000 words is read on a word-by-word basis is reduced from 16 to about 2. It is also shown empirically that the memory needed for both knowledge representations grows linearly with dictionary size and the total additional memory requirement is about twice that needed for the original dictionary.
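The two-stage idea in the abstract (a shape feature selects a neighborhood, then inter-word constraints shrink it) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the paper's own feature set or knowledge representation: the ascender/descender `shape_code`, the representation of constraints as a set of word pairs seen in a training text, and the helper names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical coarse shape classes; the paper's actual shape features are not given here.
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gjpqy")


def shape_code(word):
    """Map each letter to A (ascender), D (descender), or x (x-height)."""
    return "".join(
        "A" if c in ASCENDERS else "D" if c in DESCENDERS else "x"
        for c in word.lower()
    )


def build_neighborhoods(dictionary):
    """Group dictionary words that share the same shape code."""
    neighborhoods = defaultdict(set)
    for word in dictionary:
        neighborhoods[shape_code(word)].add(word)
    return neighborhoods


def build_transitions(training_text):
    """Assumed constraint representation: the set of adjacent word pairs in a text."""
    words = training_text.lower().split()
    return set(zip(words, words[1:]))


def reduce_neighborhood(prev_candidates, neighborhood, transitions):
    """Keep only words that may follow at least one candidate for the previous word."""
    reduced = {
        w for w in neighborhood
        if any((p, w) in transitions for p in prev_candidates)
    }
    # Assumption: if no allowable transition is known, fall back to the full neighborhood.
    return reduced or set(neighborhood)


training = "the cat sat on the mat the car ran on the road"
neighborhoods = build_neighborhoods(set(training.split()))
transitions = build_transitions(training)

# "cat", "sat", and "mat" share one shape code, so shape alone leaves three candidates.
full = neighborhoods[shape_code("cat")]
reduced = reduce_neighborhood({"the"}, full, transitions)
print(sorted(full))     # ['cat', 'mat', 'sat']
print(sorted(reduced))  # ['cat', 'mat']
```

In this toy setup, knowing that the previous word was "the" removes "sat" from the neighborhood, so the later detailed analysis only has to distinguish two words instead of three.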
Similar resources
Integration of Visual Inter-Word Constraints and Linguistic Knowledge in Degraded Text Recognition
Degraded text recognition is a difficult task. Given a noisy text image, a word recognizer can be applied to generate several candidates for each word image. High-level knowledge sources can then be used to select a decision from the candidate set for each word image. In this paper, we propose that visual inter-word constraints can be used to facilitate candidate selection. Visual inter-word const...
Character segmentation using visual interword constraints in a text page
Character segmentation is a critical preprocessing step for text recognition. In this paper a method is presented that utilizes visual inter-word constraints available in a text image to split word images into smaller image pieces. This method is applicable to machine-printed texts in which the same spacing is always used between identical pairs of characters. The visual inter-word constraints ...
Degraded Text Recognition Using Word Collocation and Visual Inter-Word Constraints
Given a noisy text page, a word recognizer can generate a set of candidates for each word image. A relaxation algorithm was proposed previously by the authors that uses word collocation statistics to select the candidate for each word that has the highest probability of being the correct decision. Because word collocation is a local constraint and collocation data trained from corpora are usual...
Algorithms for postprocessing OCR results with visual inter-word constraints
Algorithms are presented that determine the visual relationships between word images in a document. These include instances of common word images and common substrings that occur often in English language text images. This information is then used to improve the performance of a commercial optical character recognition (OCR) algorithm. The algorithms presented here calculate clusters of equi...
Learning Structured Semantic Embeddings for Visual Recognition
Numerous embedding models have been recently explored to incorporate semantic knowledge into visual recognition. Existing methods typically focus on minimizing the distance between the corresponding images and texts in the embedding space but do not explicitly optimize the underlying structure. Our key observation is that modeling the pairwise image-image relationship improves the discriminatio...